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Abstract —In the light of regularized dynamic time warping kernels, this 
paper re-considers the concept of time elastic centroid (TEC) for a set 
of time series. From this perspective, we show that TEC can be readily 
addressed as a preimage problem. Flowever, this non-convex problem 
is ill-posed, and obtaining a sub-optimal solution may involve heavy 
computational costs, especially for long time series. We then derive 
two new algorithms based on a probabilistic interpretation of kernel 
alignment matrices that expresses the result in terms of probabilistic dis¬ 
tributions over sets of alignment paths. The first algorithm is an agglom- 
erative iterative heuristic procedure inspired from a state-of-the-art DTW 
barycentre averaging algorithm. The second proposed algorithm uses a 
progressive agglomerative heuristic method to perform classical aver¬ 
aging of the aligned samples but also averages the times of occurrence 
of the aligned samples. By comparing classification accuracies for 45 
time series datasets obtained by first nearest centroid/medoid classifiers 
we show that: i) centroid-based approaches significantly outperform 
medoid-based approaches, ii) for the considered datasets, the second 
algorithm which combines averaging in the sample space and along the 
time axes, emerges as the most significantly robust heuristic model for 
time-elastic averaging with a promising noise reduction capability. 

Index Terms —Time series averaging Time elastic kernel Dynamic Time 
Warping Time series clustering and classification. 
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1 Introduction 

Since Maurice Frechet's pioneering work fl] in the early 
1900s, time-elastic matching of time series or symbolic 
sequences has attracted much attention from the scien¬ 
tific community in numerous fields such as information 
indexing and retrieval, pattern analysis, extraction and 
recognition, data mining, etc. This approach has im¬ 
pacted a very wide spectrum of applications relating 
to a multitude of socio-economic issues such as the 
environment, industry, health, energy, defense and so on. 

Among other time elastic measures. Dynamic Time 
Warping (DTW) was widely popularized during the 
1970s with the advent of speech recognition systems 
E), Q and numerous variants that have since been 
proposed to match time series with a certain degree of 
time distortion tolerance. 

The main issue addressed here is time series or shape 
averaging in the context of a time elastic distance. This is 
a long-standing issue that is currently becoming increas¬ 
ingly prevalent; it is relevant for summarizing subsets 
of time series, defining significant prototypes, identify¬ 
ing outliers, performing data mining tasks (mainly ex¬ 
ploratory data analysis such as clustering) and speeding 
up classification, as well as regression or data analysis 
processes in a big data context. 

In this paper, we specifically tackle the question of av¬ 
eraging subsets of time series, not from considering the 
DTW measure itself as has been already largely explored, 
but from the perspective of the so-called regularized 
DTW kernel (KDTW) that ensures positive definiteness. 
From this new perspective, the estimation of a time 
series average or centroid can be readily addressed as 
a preimage (inverse) problem. However, this approach 
has some theoretical and practical limitation that are 
discussed in the following sections. A more promising 
direct approach is developed here, which is based on a 
probabilistic interpretation of kernel alignment matrices, 
allowing a precise definition of the average of a pair of 
time series from the expected value of local alignments 
of samples. The tests carried out so far demonstrate the 
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robustness and the efficiency of this approach compari¬ 
son to the state-of-the art approach. 

The structure of this paper is as follows: after an 
introduction, the second section summarizes the most 
relevant related studies on time series averaging as well 
as DTW kernelization. In the third section, we show 
how the estimation of a time-elastic centroid can be 
addressed as a preimage problem in the context of the 
DTW regularized kernel (KDTW). In the fourth section, 
we derive a probabilistic interpretation from the kernel 
alignment matrices evaluated on a pair of time series. In 
the fifth section, we define the average of a pair of time 
series, and based on this pairwise averaging procedure, 
we propose two sub-optimal algorithms designed for the 
averaging of any subset of time series. 

2 Related works 

Time series averaging in the context of (multiple) time 
elastic distance alignments has been mainly addressed 
in the scope of the Dynamic Time Warping (DTW) 
measure [2j, J3). Although other time elastic distance 
measures such as the Edit Distance With Real Penalty 
(ERP) (H or the Time Warp Edit Distance (TWED) 0 
could be considered instead, without loss of generality, 
we remain focused throughout this paper on DTW and 
its kernelization. 

2.1 DTW and time elastic centroid of a pair of time 
series 

A classical formulation of DTW can be given as follows. 
If d is a fixed positive integer, we define a time series of 
length T as a multidimensional sequence v = v(i), such 
that, Vi € {1,.., T}, v(i) G R d . 

Definition 2.1: If u and v are two time series with 
respective lengths Tf and T 2 , an alignment path it = (717,) 
of length p = |7r| between u and u is represented by a 
sequence 

7 r:{l,...,p}->{l,...,T 1 }x{l,...,T 2 } 

such that 7Ti = (1,1), 7q> = (Ti,T 2 ), and (using the 
notation Tr k = (i k ,j k ), for all k G {1 1}, 

TTfc+i = ( 4 + 1 ,4+i) £ {(4 + 1,4), (4,4 + 1 ), 
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(*fc + 1) jk + 1)}- 

We define Vfc 7Tfc(l) = ik and nk(2) = jk, as the index 
access functions at step k of the mapped elements in 
the pair of aligned time series. 

In other words, a warping path defines a way to travel 
along both time series simultaneously from beginning to 
end; it cannot skip a point, but it can advance one time 
step along one series without advancing along the other, 
thereby justifying the term time-warping. 

If <5 is a distance on K. d , the global cost of a warping 
path 7r is the sum of distances (or squared distances or 
local costs) between pairwise elements of the two time 
series along 7r, i.e.: 

cost(7r) = S ( v ik’ w j k ) 

(.ik,jk)£ir 

A common choice of distance on M. d is the one generated 
by the L 2 norm: 

d 

S(x, y) = ||x — 2 /||l = 5^(x z - yi) 2 . 

i=i 

Definition 2.2: For a finite time series, any warping 
path has a finite length, and thus the number of exist¬ 
ing warping paths is finite. Hence, there exists at least 
one path 7r* whose cost is minimal, so we can define 
DTW(«,r) as the minimal cost taken over all existing 
warping paths. Hence 

DTW(«,r) = min cost(7r(u, v)) = cost(n*(u,v)). (1) 

7r 

Definition 2.3: From the DTW measure, it is 
straightforward to define the time elastic centroid c(u, v ) 
of a pair of time series u and v as the time series (c/ ; .) 
whose elements are Ck = Centroid(u(7r^.(l)),v(7r^(2)), 
V/;:: € 1, - ■ - , |7r* |, where Centroid corresponds to the 
usual definition in Euclidean space. 


2.2 Time elastic centroid of a set of time series 

A single alignment path is required to calculate the 
time elastic centroid of a pair of time series (Def. 12.311 . 
However, multiple path alignments need to be consid¬ 
ered to evaluate the centroid of a larger set of time 
series. Multiple alignments have been widely studied 
in bioinformatics 0, and it has been shown that the 
computational complexity of determining the optimal 
alignment of a set of sequences under the sum of all 
pairs (SP) score scheme is a NP-complete problem (7j 
l8l . The time and space complexity of this problem 
is 0(L k ), where k is the number of sequences in the 
set and L is the length of the sequences when using 
dynamic programming to search for an optimal solution 
|9|. This latter result applies to the estimation of the time 
elastic centroid of a set of k time series with respect 
to the DTW measure. Since the search for an optimal 


solution becomes rapidly intractable with increasing k, 
sub-optimal heuristic solutions have been subsequently 
proposed, most of them falling into one of the following 
three categories. 

2.2.1 Progressive heuristics 

Progressive heuristic methods estimate the time elastic 
centroid of a set of k time series by combining pairwise 
centroids (Def. 12.31 . This kind of approach constructs a 
binary tree whose leaves correspond to the time series 
of the data set, and whose nodes correspond to the 
calculation of a local pairwise centroid, such that, when 
the tree is complete, the root is associated with the esti¬ 
mated data set centroid. The proposed strategies differ 
in the way the tree is constructed. One popular approach 
consists of providing a random order for the leaves, and 
then constructing the binary tree up to the root using this 
ordering I TOl . Another approach involves constructing a 
dendrogram (a hierarchical ascendant clustering) from 
the data set and then using this dendrogram to calculate 
pairwise centroids starting with the closest pairs of 
time series and progressively aggregating series that are 
farther away (TTI as illustrated on the left of Fig. |T| 
Note that these heuristic methods are entirely based on 
the calculation of a pairwise centroid, so they do not 
explicitly require the evaluation of a DTW centroid for 
more than two time series. Their degree of complexity 
varies linearly with the number of time series in the data 
set. 

2.2.2 Iterative heuristics 

Iterative heuristics are based on an iterated three-step 
process. For a given temporary centroid candidate, the 
first step consists of calculating the inertia, i.e. the sum 
of the DTW distances between the temporary centroid 
and each time series in the data set. The second step 
evaluates the best pairwise alignment with the tempo¬ 
rary centroid for each time series uj(i) in the data set 
(j G {1 •■•?!}). A new time series &j(i) is thus con¬ 
structed that contains all the samples of time series Uj(i), 
but with time being stretched or compressed accord¬ 
ing to the best alignment path. The third step consists 
of producing a new temporary centroid candidate c(i) 
from the set {uj(i)} by successively averaging (in the 
sense of the Euclidean centroid), the samples at every 
timestamp i of the Uj(i) time series. Basically, c(i) = 
V u l (i)A\[i.j) V,l(i,i)), where 1 (i,j) is an 
indicator function equal to 1 if time series iij is defined 
for timestamp i, but which is otherwise 0. 

Thus, the new centroid candidate replaces the previ¬ 
ous one and the process is iterated until the inertia is no 
longer reduced or the maximum number of iterations is 
reached. Generally, the first temporary centroid candi¬ 
date is taken as the DTW medoid of the considered data 
set. This process is illustrated on the right of Fig. [l] The 
three steps of this heuristic method were first proposed 
in m- The iterative aspect of this heuristic approach 
was initially introduced by Ifl3l and refined by fl4l . 
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Fig. 1. Progressive hierarchical with similar first agglomeration (left) v.s. iterative agglomeration (right) strategies. 
Final centroid approximations are presented in red bold color. Temporary estimations are presented using a bold 
dotted black line 


Note that, in contrast to the progressive method, this 
kind of approach needs to evaluate, at each iteration, all 
the alignments with the current centroid candidate. The 
complexity of the iterative approach is higher than the 
progressive approach, the extra computational cost being 
linear with the number of iterations. More sophisticated 
approaches have been proposed to escape some local 
minima. m have evaluated a genetic algorithm for 
managing a population of centroid candidates, thus im¬ 
proving with some success the straightforward iterative 
heuristic methods. 

2.2.3 Optimization approaches 

Given the entire set of time series § and a subset of n time 
series S = {uj}j= \... n C S, optimization approaches at¬ 
tempt to estimate the centroid of S from the definition of 
an optimization problem, which is generally expressed 
by Eq. [2] given below: 

n 

c = argmin DTW( s,Uj ) (2) 

3 =1 

To our knowledge, the first attempt to use this kind of 
direct approach for the estimation of time elastic centroid 
estimation was recently described in fl6l . 

These authors (op.cit.) derived a solution of their 
original non-convex constrained optimization problem, 
by integrating a temporal weighting of local sample 
alignments to highlight the temporal region of interest in 
a time series data set, thus penalizing the other temporal 
regions. Two time elastic measures were specifically 
addressed: i) a dynamic time warping measure between 
a time series and a weighted time series (representing 


the centroid estimate) and ii) an (indefinite) kernel DTW 
called DTAK fl7| . Their results are very promising: 
although the number of parameters to optimize is linear 
with the size and the dimensionality of the time series, 
the two steps gradient-based optimization process they 
derived is very computationally efficient and shown to 
outperform the state of the art approaches on some 
challenging scalar and multivariate data sets. However, 
as numerous local optima exist in practice, the method is 
not guaranteed to converge toward the best possible cen¬ 
troid, which is anyway the case in all other approaches. 

2.3 Discussion and motivation 

According to the state of the art in time elastic cen¬ 
troid estimation, an exact centroid, if it exists, can be 
calculated by solving a NP-complete problem whose 
complexity is exponential with the number of time series 
to be averaged. Heuristic methods with increasing time 
complexity have been proposed since the early 2000s. 
Simple pairwise progressive aggregation is a less com¬ 
plex approach, but which suffers from its dependence 
on initial conditions. Iterative aggregation is reputed to 
be more efficient, but entails a higher computational 
cost. It could be combined with ensemble methods or 
soft optimization such as genetic algorithms. The non- 
convex optimization approach has the merit of directly 
addressing the mathematical formulation of the centroid 
problem in a time elastic distance context. This approach 
nevertheless involves a higher complexity and must deal 
with a relatively large set of parameters to be optimized 
(the weights and the sample of the centroid). Its scalabil¬ 
ity could be questioned, specifically for high dimensional 
multivariate time series. 
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It should also be mentioned that some criticisms of 
these heuristic methods have been made in |18l . Among 
other drawbacks, the fact that DTW is not a metric 
(the triangle inequality is not satisfied) could explain 
the occurrence of unwanted behaviour such as centroid 
drift outside the time series cluster to be averaged. We 
should also be borne in mind that keeping a single best 
alignment (even though several may exist, without men¬ 
tioning the good ones) can increase the dependence of the 
solution on the initial conditions. It may also increase 
the aggregating order of the time series proposed by the 
chosen method, or potentially enhance the convergence 
rate. 

In this study, we do not directly address the is¬ 
sue of time elastic centroid estimation from the DTW 
perspective, but rather from the point of view of the 
regularized dynamic time warping kernel (KDTW) [19). 
This perspective allows us to consider centroid estima¬ 
tion as a preimage problem, which is in itself another 
optimization perspective. More importantly, the KDTW 
alignment matrices can be used to derive a probabilistic 
interpretation of the pairwise alignment of time series. 
This leads us to propose a robust interpolation scheme 
jointly along the time axis and in the sample space. We 
do not claim that using KDTW and its probabilistic in¬ 
terpretation can solve all or even any of the fundamental 
questions raised earlier: since the problem tackled here 
is NP-complete, an exact solution requires exponentially 
complex computations and any heuristic method must 
handle numerous local minima. Our aim is to throw 
some new light on the problem as well as obtain new 
quantitative results showing, in this difficult context, that 
the proposed alternative approach is worth considering. 

2.4 Time elastic kernels and their regularization 

Dynamic Time Warping (DTW), (2), (3 as defined in 
Eq|T| can be recursively evaluated as 

d d tw{X p ,Y q ) = d 2 E {x(p),y(q)) (3) 

{ dd,tw(Xp—i 1 Yq') sup 

ddt w {X p -i,Y q -i) sub 
ddt.v; {Xp - Y q _i ) ins 

where d E {x(p),y(q) is the Euclidean distance (eventually, 
the square of the Euclidean distance) defined on R fc 
between the two positions/?points in sequences X and 
Y taken at times p and q, respectively. 

Apart from the fact that the triangular inequality does 
not hold for the DTW distance measure, it is furthermore 
not possible to define a positive definite kernel directly 
from this distance. Hence, the optimization problem, 
which is inherent to the learning of a kernel machine, is 
no longer quadratic and, at least for some tasks, could 
be a source of limitation. 

Regularized DTW: recent studies (20), fl9l lead us 
to propose new guidelines to ensure that kernels con¬ 
structed from elastic measures such as DTW are positive 


definite. A simple instance of such a regularized kernel, 
derived from m, can be expressed in the following 
form, which makes use of two recursive terms: 

KDTW(X P , Y q ) = K*yjX p , Y q ) + K% W (X P , Y q ) 

KdL(X p ,Y q ) = 

{ h{p - 1, q)K^ w (X p _ i, Y q ) 

{ Hp,q-i)Kd? w (x p , r,_i) 

K x d L{Xp,Y q ) = i 

[ h{p - 1, q)K% w {X^ u Y q )e-'’**(*V>,vW) 

E < A P , q h(p, q)K% a (X p - 1 ,Y q - 1 )e-'**W*M 

( hip, q - 1 )Kg*,(Xp, Yq^e-^M’yW 

(4) 

where is the Kronecker symbol, v £ M + is a stiffness 
parameter which weights the local contributions, i.e. the 
distances between locally aligned positions, and d E (-, ■) 
is a distance defined on M. fc . 

The initialization is simply KfffXo, Yq) = 
K x d fJX 0 ,Y 0 ) = 1. 

The main idea behind this regularization is to 
replace the operators min and max (which prevent 
symmetrization of the kernel) by a summation operator 
(E)- This allows us to consider the best possible 
alignment, as well as all the best (or nearly the best) 
paths by summing their overall cost. The parameter 
v is used to check what is termed as nearly-the-best 
alignment, thus penalizing alignments that are too far 
away from the optimal ones. This parameter can be 
easily optimized through a cross-validation. 

3 KDTW CENTROID AS A PREIMAGE PROB¬ 
LEM 

In this section, we tackle the centroid estimation 
question from a kernelized centroid point of view, the 
kernel of interest being KDTW. 

The Moore-Aronszajn theorem t2TI establishes that a 
reproducing kernel Hilbert space (RKHS) exists uniquely 
for every positive definite kernel and vice-versa. Let H 
be the RKHS associated to kernel n defined on a set 
X, and let (.,.)« be the inner product defined on T~L. 
In addition, the representer property of the evaluation 
functional in H is expressed as: for any f> £ H and any 

Xj £ ff, 1p(Xj) = 

Denoting </>(.) as the map that assigns the kernel 
function «;(.. x) to each input x £ X, the reproducing 
property of the kernel implies that for any {x,, x 3 ) £ X 2 , 
n{xi,Xj) = {cj){ah),f{xj))n- 

Furthermore, D- H {x i ,Xj) 2 = Wfixf) — f{xj )||^ = 
{<f>(xi),<j>{xi))-H + (fixj), (t>ixj)) n - 2.{f{x. i ),(j){x ) ) n is the 

generalization of the squared Euclidean distance defined 
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in the feature space TL, which can be expressed in kernel 
terms as: D^Xi.Xj) 2 = n(xi, Xi) 2 +n(xj, Xj) 2 — 2.K(xi, Xj) 

(the so-called kernel trick). 

Finally, the representer theorem Il22l states that any 
function <p(.)* of a RKHS 'H minimizing a regularized 
cost functional of the form: 

n 

E +g{\Mn) 

2=1 

with predicted output ip(xi) for input x, and desired 
output yj, where g(.) is a strictly monotonically increas¬ 
ing function on R+-, is equivalent to a kernel expansion 
expressed in terms of available data ({(xi,yi)}) 

n 

<P%) = E'Y iK ( x< >.), where Vi, 7 * S R. (5) 
2=1 

Hence, a direct definition of the kernelized centroid of 
the set {xi,i = l..n} expressed in the RKHS 'H feature 
space associated with kernel k can be written as: 

n 

<p*(.) = arg min V||<p(.)(6) 

n 

— ar g min n- ||<p(-)llw - 2 • E^(')’ 

The representer theorem applies and thus <p*(.) takes 
the form given in Eq. [5) which allows us to rewrite Eq. 
[ 6 ] as follows: 

n n 

(£>*(.)= arg min EE liljK{Xi,Xj) 

{ % }*—T n 
n n 

-^EE^’ 1 ^ ( y ) 

2=1 j=1 

Unfortunately, if the kernelized centroid is related to 
a well-defined quadratic optimization problem in the 
RKHS space 'H (Eq. 0, it is an ill-posed problem in 
set X. This is known as the preimage problem, since 
the pre-image of </>(.)* might not exist. Instead, we are 
seeking the best approximation, namely x* € X whose 
map (j)(x*) = k(.,x*) is as close as possible to <p(.)*, as 
illustrated in Fig|2] 

Hence, if we remove the term that does depend upon 
x, the optimization problem becomes: 

n 

x* = argminn • ||rc(.,a;)||^ - 2 • V(k(., x), k(., Xj)) n 

3 =1 
n 

= argminn • k(x, x) — 2 • k(x, xj) (8) 

x&X j =1 

For KDTW, the non-convex optimization problem 
cannot be straightforwardly addressed using gradient- 
based approaches mainly because the derivative cannot 
be determined analytically. Moreover, the number of 



Fig. 2. Centroid estimation viewed as a preimage prob¬ 
lem. 



Fig. 3. Centroid estimation for the three categories con¬ 
tained in the CBF dataset as a solution of the preimage 
problem. In bold, the centroid time series; in light red, the 
time series of the averaged dataset. At top left of figure, 
the value of the minimized functional is plotted on a log- 
scale versus the iteration index. 


variables (linear with the length of the time series and 
with the dimensionality of each sample) is generally 
high so this approach often encounters combinatorial 
difficulties related to the number of local minima. A 
derivative-free method could nevertheless be applied for 
local modelling of the functional to be optimized. In an 
attempt to carry out such a preimage formulation to 
estimate the time elastic centroid for a set of time se¬ 
ries, we applied the state-of-the-art BOBYQA algorithm 
developed for bound constrained optimization without 
using derivatives I l23l . Fig[5] and Fig[4] give the centroid 
estimations for each category of the CBF and Trace 
datasets, respectively [J24J. On the top left diagram of 
the figures, the values of the function to be minimized 
are plotted against the number of iterations. The op¬ 
timization process is initialized using the medoid for 
each category. We show that the required number of 
iterations is quite high and depends on the number of 
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Fig. 4. Centroid estimations of the first three categories 
(out of four) contained in the Trace dataset as a solution of 
the preimage problem. In bold blue, the centroid time se¬ 
ries; in light red, the time series of the averaged dataset. 
Top left diagram of figure shows value of the minimized 
functional expressed on a log-scale, plotted against the 
iteration index. 


i.e. the product along the path of the local alignment 
costs. We can give a probabilistic interpretation of these 
local costs exp(—vd%(X(nk(l)),Y(TTk(2)))): basically, we 
can assume that these local costs correspond (within the 
magnitude of the scalar multiplication constant) to the 
local a priori probability of aligning sample X(TTk(l)) 
with sample Y{ 7Tfc(2)). By making this assumption, we 
eventually attach a probability distribution to the set 
of all alignment paths, with the cost(n) corresponding 
(within the magnitude of the scalar multiplication con¬ 
stant) to the probability attached to alignment path 7 r. 

Hence, the cell (i,j) of matrix AM, contains the sum 
of the probabilities (within the magnitude of the scalar 
multiplication constant) of the paths that connect cell 
(1,1) to cell (i,j). 

Similarly, if, instead of X and Y, we evaluate the 
similarity between X r and Y r derived from X and Y by 
reversing the temporal index, we obtain an alignment 
matrix AM r whose cell (i,j) contains the sum of the 
probabilities (to within a multiplicative scalar constant) 
of the paths that connect cell (p, q) tp cell (i, j). 

Finally, multiplying properly cells of AM with cells 
of AM r yields the Alignment Matrix Average [AMA) 
defined as: 


variables. For the CBF dataset, the time series is made up 
of 128 samples while there are 275 samples for the Trace 
dataset. The convergence rate is roughly ten times slower 
for the Trace data set compared with the CBF dataset, 
mainly because KDTW complexity is quadratic with the 
length of the time series. The iteration cost becomes 
somewhat prohibitive for long time series or large time 
series datasets. Although this approach could be possibly 
optimized, the parameters need to be carefully set up 
(basically, definition of the trust region) and, in any case, 
as stated above, the optimum so provided remains an 
estimation of the centroid that is sought. Finally, note 
that the functional starts to decrease after attaining the 
number of iterations (in this case, twice the length of the 
time series) initially required for local estimation of the 
functional. 

4 Probabilistic interpretation of time 

ELASTIC KERNEL ALIGNMENT MATRICES 

In this section, we consider the recursive term K ^ w (.,.) 
that is used in Eq. 0] When evaluating the similarity 
between two time series X p and Y q with respective 
lengths of p and q, this recursion allows the construction 
of an alignment matrix AM(i,j) with i £ {1 ■ ■ -p} and 
i £ {1 ■■■q}- The cell at location (i,j) contains the 
summation of the global costs of all alignment paths, 
as defined in definition 12.11 that connect cell (1,1) with 
cell (i, j). For any alignment path 7r, the global cost is 
expressed as: 

M 

costfjr) = IK" 4 (X(n k (l)),y(w k ( 2 ))) (9) 

fc=1 


AMA(i,j) = AM(i,j) ■ AM r (p -i + l,q-j + l) (10) 


and whose cell (*, j) contains the sum of the probabili¬ 
ties (upto the normalization constant) of the paths that 
connect cell (1,1) to cell (p, q) while going through the 
cell 

From this path probability distribution, we can now 
derive an alignment probability distribution between the 
samples of X and the samples of Y as follows: 


• For all i, the probability of aligning sample X (/,) is 
P(i) = 1; all samples need to be aligned. 

• Similarly, for all j, the probability of aligning sample 

Y(j) is P(j) = l. 

• The probability of aligning sample X(i) with sample 
Y(j) is P(i,j) = P(i\j).P(j) = P(i\j). P[i\j) is the 
probability that sample X(i) is aligned with sample 
Y(j) given that the alignment process is in state j. 
The estimation of P(i\j) is obtained by using matrix 
AM A: 


pm 


AMA(i,j) 

V; , AMA{i,j) 


m Furthermore, the probability of aligning sample 
X(i) with sample Y(j) is also P(i,j) = P(J\i).P(i) = 
P(j\i). Similarly, the estimation of P(j\i) is obtained 
by using matrix AM A: 




AMA(i,j) 

£' ] =1 AMA(i,j ) 


( 11 ) 


Note that the normalization constant mentioned above 
is eliminated. 

Since P(i,j) = P(i\j) = P{j\i), we can finally estimate 
the probability of aligning sample X(i) with sample Y (j) 




































as follows: 


1 f AMA(i,j) AMA(i,j) \ 

[hJ) 2 ' 1 ELi AMA(i,j) + EU AMA(i,j) J 

( 12 ) 

Eq. [12] forms the basis of our pairwise time elastic time 
series averaging algorithm given below. 

As an example. Fig [5] presents the AMA matrix corre¬ 
sponding to the alignment of a positive halfwave with 
a sinus wave. The three potential alignment pathes are 
clearly identified in the light blue and red colors. 



Fig. 5. AMA matrix for the alignment of a positive 
halfwave with a sinus wave. 


5 Time elastic centroid based on the 

AMA ALIGNMENT MATRIX 

Based on the structure of the KDTW kernel and the 
AMA matrix, and by using the so-called DtwBarycenter 
Averaging (DBA) method developed by lfl2l . fl4] . fl3| , 
we first present the KernelDtwNarycenter Averaging 
(KDBA) algorithm for estimating a time elastic centroid 
for a set of time series according to an iterative agglomer- 
ative procedure as shown in Fig. Ilbl Secondly, we detail 
the concept of a time elastic average for a pair of time 
series (KDTW-PWA), and then develop the progressive 
heuristic approach presented in Fig. [la] that uses KDTW- 
PWA to estimate another kind of time elastic centroid 
(KDTW-C1) for a set of time series of any cardinal. 

5.1 KDTW-Centroid of a set of time series based on 
KDBA algorithm 

Following the DBA algorithmic approach fl2l . fl3l . we 
present here the development of our kernelized version 
called KDBA. KDBA directly applies the definition of the 
alignment matrix average (AMA) as given in EqflOl and 
its probabilistic interpretation Eqll2l 

Let us consider a set S of N time series, S = 
{Si,S 2 , ■ • • , Sn}, and R a reference time series. Let \R\ 
and |S n | be the lengths of R and S n , respectively. P n {i,j), 
with i = 1{1, ...|5„|} and j = 1{1, ...|i?|}, is obtained from 
the AMA matrix resulting from the alignment of S n with 
R, according to Eq|T2] Algorithm [l] computes an average 
time series A according to the following equation: 

1 N |S„| 

V* G {!,■■• ,M}, A (*) = (13) 

n=l j=l 


Algorithm 1 KDBA 
l: procedure KDBA(i?,S', v) 

2: // R: a reference time series 

3: // S: a set of time series {Si, • • • , Sn} 

4: / / u: the stiffness parameter of KDTW kernel 

5: Double AMA(.,.); 

6: Vector-Of-SetOfSamples SampleAssociations(L); 

7: Ts A(|f?|); //Create a D dimensional 

8: //time series of length L ; 

9: for Int i = 1 to | R | do SampleAssociations(i)={}; 

10: for Int n = 1 to |S| do 

11: Evaluate AMA matrix for R, S n with v; 

12: Ts ts/ /containing L "zeroed" samples; 

13: Double normFactor(\R\)-, 

14: for Int * = 1 to \R\ do 

15: normFactor(i)=0; 

16: for Int j = 1 to |S n | do 

17: ts(i) = ts(i ) + S n (j) * AMA(i,j ); 

18: nor mF actor (i) = nor mFactor (*)+ 

19: AMA(i,j ); 

20: ts(i) = tsi(i)/normF actor (i); 

21: SampleAssociations(i)=(fs(i)); 

22: for Int i = 1 to |i?| do 

23: A(i)=barycenter(SampleAssociations(i)); 

24: return A 


Algorithm 2 iKDBA 
l: procedure lKDBA(C,S, v) 

2: //C: a reference time series 

3: //S: a set of time series 

4: //maxlter: maximum number of iterations 

5: / /v. the stiffness parameter of KDTW kernel 

6: Ts A; //a D dimensional Timeseries 

7: Double inertia = computeInertia(C', S); 

8: Boolean Continue=True; 

9: Int i = 0; 

10: while Continue do 

ll: A=C; 

12: C=KDTW-C2(C,S, ;/); 

13: Double new_inertia = computelnertia (C, S '); 

14: if new_inertia > inertia OR i > maxlter then 

15: Continue = False; 

16: i=i+l; 

17: return A 


Note that the iterative average of time series produced 
by algorithm [l] has the same size as the reference time 
series R. 

The algorithm [l] can be refined by iterating until no 
further improvement is obtained fl4). An improvement 
is observed when the sum of the distances (resp. sim¬ 
ilarities) between the current average R and the new 
pairwise average provided by KDBA, A, is lowered 
(resp. increased). Algorithm [2] implements this iterative 
strategy, which will necessarily find a local minimum 
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or will stop when a maximum number of iterations has 5.2 KDTW average of a pair of time series (KDTW- 
been reached. PWA) 



Fig. 6. Expected time location for the centroid (in red 
circles) of two triangular-shaped time series shifted in 
time (in blue ’+’ and black ’*’) 


Algorithm 3 KDTW-PWA 


10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 
21 

22 

23 

24 

25 

26 

27 

28 

29 

30 

31 

32 

33 

34 

35 

36 

37 


procedure KDTW-PWA(X,F, AM A) 

//X,Y: two time series of D dimensional samples 
/ /AMA: the average alignment matrix for X,Y 
Int p = \X\, q = \Y\, L = max{p, q}' 

Ts A(L), B(L); //Create 2 D dimensional 
/ / time series of length L ; 

Double a; 

Double N A (L),N B (L); //two double arrays 
for Int i = 1 to L do 
for d=l to D do 

A(i,d) = 0, B(i,d) = 0; 

N A {i) = 0, N B (i) = 0; 
for Int * = 1 to L do 
if i < p then 

for Int j = 1 to q do 

a = (i + j )/ 2 - L(*+j)/ 2 J; 

for d=l to D do 

A (Y(i + j)/2j , d)+ = 
a ■ ( X(i,d ) +Y(j,d)) ■ AMA(i,j); 

+ j)/2~|, d)+ = 

(1 - a) ■ ( X(i, d)+Y(j, d)) ■ AMA(i, j ); 

Aa(L(* + j)/2j)+ = a* AMA(i,j); 
N A (Ui+j)/2])+ = (1 — a) * AM A(i, j); 

if i < q then 

for Int j = 1 to p do 

a = (i + j)/ 2- L(*+j)/ 2 J; 

for d=l to D do 

B([(i + j)/2\,d)+ = 
a • ( X U, d ) + Y (i,d)) ■ AMA(j,i); 
B(\(i d)+ = 

(1 -a)-(X(j, d)+Y(i, d))-AMA(j, i); 

N B(l(i + j)/2])+ = a* AMA(j,i); 
N B (\(i + j)/2-})+ = (1 -a)*AMA(J,i); 

for Int i = 1 to L do 
for d=l to D do 

A(i,d) = (A(i,d)/N A (i) + B(i,d)/N B (i))/4; 

return A 


Similarly to DBA, the KDBA algorithm averages a set 
of time series in the sample space but not along the 
time axis. Basically, let us suppose we are averaging 
two triangular-shaped time series such as represented 
by the blue crosses and black dots on Fig l5.ll When 
using DBA or KDBA algorithms with one of the two time 
series acting as the reference, then the calculated average 
would be the reference distribution itself. However, we 
would also expect to average the time shift between the 
two series, thus obtaining the distribution indicated by 
the red dots in Fig.fig:time-shift. This is precisely our 
main motivation for the deriving the following Pair Wise 
Averaging (KDTW-PWA) algorithm designed to average 
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a pair of time series in the sample space but also along 
the time axis. 

Algorithm [3] provides the KDTW-PWA average (A) of 
the two time series X and Y according to Eql l4l 


Mk = l---L, A{k)= Y. ■ 

\ *+■?' — fc ' 

^2 f p m+hm *w+n j) 

*hl4 2=fc 


X(i) + Y(j) 



Fig. 7. Averaging triangular-shaped time series. On the 
left, the two time series (in blue) are identical (superim¬ 
posed) and the centroid (red) is amplified by a factor of 
two. On the right, the two time series (in blue) have the 
same shape but have been shifted in time. The KDTW- 
PWA average is given in red, still amplified by a factor 
of two. The corresponding (normalized) AMA alignment 
matrices are given at the bottom. 

As the time indices are considered discrete (integer 
values), the time averaging (i+j )/2 is smoothed between 
the floor and cell integer values, using the smoothing 
coefficient a (line 17 of the algorithm). 

Thus, the KDTW-PWA jointly averages the sample 
values of the two time series and their time locations. 
Eq. [14] allows us to interpret the centroid of a pair of 
time series as the mathematical expectation of aligning 
the two sequences of samples. 



Fig. 8. Centroid corresponding to the pairwise alignment 
for the sinus experiment depicted in Fig |5] 

As an example, the centroid corresponding to the 
pairwise alignment of the sinus experiment depicted in 


Fig[5]is presented in Figj8] Notice that in the centroid, the 
negative halfwaves of the sine wave have been filtered. 
This is because the negative halfwaves do not match 
with the positive halfwave that is aligned with the sine 
wave. 

In Fig 0 we present a very simple experiment that 
consists of averaging two identical triangular-shaped 
time series (on left of figure) and two time series with 
identical triangular shapes but shifted in time. At the 
bottom of the figure, the corresponding AMA matrices 
are presented. The KDTW-PWA distributions, presented 
in red, are multiplied by a factor of two to facilitate 
reading of the figure. We can see that, for both situations, 
the centroid is precisely located at the correct averaged 
time of occurrence of the two time series, whether or not 
they are shifted in time. The most likely alignment areas 
on the AMA matrices are shown in in red and the less 
likely alignment areas in blue. The time shift is clearly 
visible on the right-hand figure. 

5.3 KDTW-Centroid of a set of time series based on 
KDTW-PWA 


Algorithm 4 pKDTW-PWA 
l: procedure pKDTW-PWA(S, v) 

2: //S: a set of time series of D dimensional samples 

3: / / V. the stiffness parameter of KDTW kernel 

4: Ts A; / /a D dimensional time series 

5: SetOfTimeSeries So; 

6: while |S| > 1 do 

7: S 0 = 0 

8: while |S| > 1 do 

9: Let ts i, ts ‘2 the first two time series in S; 

10: Evaluate the AMA matrix for ts\ and ts 2 

ll: with v as the stiffness parameter 

12: A = KDTW-PWA(tsi, ts 2 , AMA); 

13: So = So U {A}; 

14: s = S \ {tsi, tS 2 }; 

15: S = So U S; 

16: Let A be the single element of S; 

17: return A 


To average a larger set of time series using the pairwise 
average KDTW-PWA, we simply adopt the progres¬ 
sive agglomerative approach presented in FigJTa] This 
heuristic approach, detailed in Algorithm [4] has 0(n) 
complexity, n being the size of the considered set of time 
series. 

The figures presented in Table [l] compare the centroid 
estimates provided by the iterated DBA, iKDBA and 
pKDTW-PWA algorithms. For the experiment. The DBA 
and iKDBA were iterated at most 20 times. Although the 
DBA and iKDBA estimates appear quite similar, the cen¬ 
troid estimates provided by the pKDTW-PWA algorithm 
is much smoother. This is a general property of the latter 
algorithm, which implements a time averaging principle 
based on the time expectation of sample occurrences. 
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TABLE 1 

Centroid estimation for the three categories of the CBF dataset. The centroid estimation is indicated as a bold black 
line superimposed on of the time series (in light red) that are averaged. The centroid estimates provided by the DBA 
algorithm are given on the left side, the estimates provided by the iKDBA algorithm in the centre and the estimates 

provided by the pKDTW-PWA algorithm on the right side. 


thus somehow allowing it to filter noisy data. Note also 
that the DBA and iKDBA estimates for the CBF data 
set are close to the results provided by the preimage 
approach (Fig(3]l. 

6 Experimentation 

The purpose of this experiment is to evaluate the effec¬ 
tiveness of the proposed time elastic averaging methods 
against a double baseline, namely k-medoid-based ap¬ 
proaches and the DBA algorithm. The first baseline allow 
us to compare centroid-based with medoid-based ap¬ 
proaches. The second baseline highlights the advantages 
we can expect from using p.d elastic kernels instead of 
indefinite kernels such as DTW in the context of time 
series averaging. DBA is also currently considered as a 
state of the art method to average a set of sequences 
consistently with DTW. 

For this purpose, we empirically evaluate the 
effectiveness of the methods using a first nearest 


centroid/medoid (1-NC) classification task on a set 
of time series derived from widely diverse fields of 
application. The task consists of representing each 
category contained in a training data set by estimating 
its medoid or centroid and then evaluating the error 
rate of a 1-NC classifier on an independent testing data 
set. Flence, the classification rule consists of assigning 
to the tested time series the category which corresponds 
to the closest (or most similar) medoid or centroid 
according to DTW or KDTW measures. 

In l25l a nice generalized k-NC task is described. The 
authors demonstrate that by selecting the appropriate 
number k of centroids (using DBA and k-means), they 
achieve, without loss, a 70% speed-up in average, com¬ 
pared to the original k-Near Neighbor task. Although, 
in general, the classification accuracies is improved 
when several centroids are used to represent the TRAIN 
datasets, our main purpose is to highlight and amplify 
the discrimination between time series averaging meth- 
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ods: this is why stick here with the 1-NC task. 

DBA and iKDBA iterative centroid methods are 
iterated at most 20 times and yield local estimates of the 
centroid. The pKDTW-PWA progressive agglomerative 
centroid method is only processed once, and hence is 
roughly 20 times faster than iKDBA and about 10 times 
faster than DBA. 

A collection of 45 data sets is used to assess the 
proposed algorithms. The collection includes synthetic 
and real data sets, as well as univariate and multivariate 
time series. These data sets are distributed as follows: 

• 42 of these data sets are available at the UCR repos¬ 
itory |[24) . Basically, we used all the data sets except 
for StarLightCurves, Non-lnvasive Fetal ECG Thoraxl 
and Non-lnvasive Fetal ECG Thorax2. Although these 
last three data sets are still tractable, their compu¬ 
tational cost is high because of their size and the 
length of the time series they contain. All the data 
sets are composed of scalar time series. 

• One data set, uWaveGestureLibrary_3D was con¬ 
structed from the uWaveGestureLibrary_X—Y—Z 
scalar data sets to compose a new set of multivariate 
(3D) time series. 

• One data set, CharTrajTT, is available at the UCI 
Repository |26) under the name Character Trajectories 
Data Set. This data set contains multivariate (3D) 
time series and is divided into two equal sized data 
sets (TRAIN and TEST) for the experiment. 

• The last data set, PWM2, which stands for Pulse 
Width Modulation |27|. was specifically defined to 
demonstrate a weakness in dynamic time warping 
(DTW) pseudo distance. This data set is composed 
of artificial scalar time series. 

For each dataset, a training subset (TRAIN) is defined 
as well as an independent testing subset (TEST). We use 
the training sets to extract single medoids or centroid 
estimates for each of the categories defined in the data 
sets. 

Furthermore, for KDTWMedoid, iKDBA and pKDTW- 
PWA, the v parameter is optimized using a leave-one-out 
(LOO) procedure carried out on the TRAIN data 
sets. The v value is selected within the discrete set 
{.05,. 1,.25, .5,1,2,5,10,25,50,100}. The value that 
minimizes the LOO classification error rate on the 
TRAIN data is then used to provide the error rates that 
are estimated on the TEST data. 

The classification results are given in Table [2] It can be 
seen from this experiment, that 

i) Centroid-based methods outperform medoid-based 
methods: DBA yields lower error rates compared 
to DTW Medoid, as do iKDBA and pKDTW-PWA 
compared to KDTW Medmd . 

ii) iKDBA outperforms DBA: under the same experi¬ 
mental conditions (maximum of 20 iterations), the 


kernalized version of the DTW measure leads to 
better classification accuracy. To some extent, this 
confirms previous results obtained for SVM classifi¬ 
cation m on such kinds of datasets, 

iii) pKDTW-PWA outperforms iKDBA: this results 
seems to show that joint averaging in the sample 
space and along the time axis improves the 
classification accuracy. As pKDTW-PWA provides a 
centroid estimation in a single agglomerative step, 
we can conjecture that this method converges faster 
toward a satisfactory centroid candidate. 

The average ranking for all five tested methods, 
which supports our preliminary conclusion, is given at 
the bottom of Table 12 

Following the study of (28l on statistical tests available 
to evaluate the significance of differences in error rate 
between classifiers over multiple data sets, we conducted 
a Friedman's significance test, a sort of non-parametric 
counterpart of the well-known ANOVA. This test ranks 
the algorithms for each data set separately, the best 
performing algorithm being given a rank of 1, the second 
best rank 2, etc. 

According to this test, the null hypothesis is rejected 
(with a P — value < 2.2e — 16). Post-hoc tests can then 
be carried out to compare pairwise algorithms using 
the Wilcoxon-Nemenyi-McDonald-Thompson test 1291 . 
For this purpose, we use the R code provided by l30l 
to generate the parallel coordinate plots and boxplots 
presented in Figj9] as well as the results reported in 
Table 0 

TABLE 3 

Significance test: Algorithm 1 is considered to be 

significantly better than Algorithm 2 according to the 
Friedman’s test if the P-value (in bold characters) 
associated with the pairwise test is less than 0.05. 


Algorithmi 

Algorithm 2 

P-value 

DBA 

DTW Medoid 

1.98e-05 

KDTW AJedoid 

DTW Medoid 

2.99e-03 

iKDBA 

DTW Medoid 

1.38e-10 

pKDTW-PWA 

DTW Medoid 

1.09e-12 

KDTW AJedoid 

DBA 

7.84e-01 

iKDBA 

DBA 

3.10e-01 

pKDTW-PWA 

DBA 

2.60e-02 

iKDBA 

KDTW Medoid 

1.90e-02 

pKDTW-PWA 

KDTW Medoid 

4.07e-04 

pKDTW-PWA 

iKDBA 

8.36e-01 


Table [3] reports the P-values for each pair of tested 
algorithms. This post-hoc analysis partially confirms 
our previous analysis of the classification results. If 
we consider that the null hypothesis is rejected when 
the P-value is less than 0.05, the post-hoc analysis 
shows that centroid-based approaches perform signif¬ 
icantly better than medoid-based approaches. Further¬ 
more, KDTW Uf , rfmrf appears to be significantly better 
than DTW Me doid- 
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TABLE 2 

Comparative study using the UCR and UCI data sets: classification error rates evaluated on the TEST data set (in %) 
obtained using the first nearest neighbour classification rule for DT\N M edoid, DBA (centroid), KDTW Medoid, iKDBA 
(centroid) and pKDTW - PWA (centroid). A single medoid/centroid extracted from the TRAIN data set represents 

each category. 


DATASET 

# Cat | L 

DT yN Medoid 

DBA 

KDTW Medoid 

iKDBA 

pKDTW-PWA 

Synthetic_C ontrol 

6 60 

3.00 

2.00 

3.33 

2.00 

4.67 

Gun Point 

2 150 

44.00 

32.00 

52.00 

25.33 

25.33 

CBF 

3 128 

7.89 

5.33 

8.11 

4.67 

5 

Face_(all) 

14 131 

25.21 

18.05 

20.53 

17.34 

17.04 

OSU_Leaf 

6 427 

64.05 

56.20 

53.31 

52.89 

54.54 

Swedish_Leaf 

15 128 

38.56 

30.08 

31.36 

30.24 

24.00 

50Words 

50 270 

48.13 

41.32 

23.40 

20.44 

19.34 

Trace 

4 275 

5.00 

7.00 

23.00 

20.00 

2.00 

Two_Patterns 

4 128 

1.83 

1.18 

1.17 

1.03 

1.12 

Wafer 

2 152 

64.23 

33.89 

43.92 

12.11 

31.96 

Face_(four) 

4 350 

12.50 

13.64 

17.05 

6.82 

10.23 

Lightning-2 

2 637 

34.43 

37.70 

29.51 

29.51 

22.95 

Lightning-7 

7319 

27.40 

27.40 

19.18 

17.81 

20.55 

ECG200 

2 96 

32.00 

28.00 

29.00 

28.00 

27.00 

Adiac 

37 176 

57.54 

52.69 

40.67 

72.12 

41.43 

Yoga 

2 426 

47.67 

47.87 

47.53 

49.80 

49.90 

Fish 

7463 

38.86 

30.29 

20.57 

19.42 

17.14 

Beef 

5 470 

60.00 

53.33 

56.67 

53.33 

53.33 

Coffee 

2 286 

57.14 

32.14 

32.14 

32.14 

21.43 

OliveOil 

4 570 

26.67 

16.67 

30 

20.00 

13.33 

CinC_ECG_torso 

4 1639 

74.71 

53.55 

66.67 

59.85 

49.64 

ChlorineConcentration 

3 166 

65.96 

68.15 

65.65 

67.94 

65.78 

DiatomSizeReduction 

4 345 

22.88 

5.88 

11.11 

5.56 

1.96 

ECGFiveDays 

2 136 

47.50 

30.20 

10.92 

19.75 

17.88 

FacesUCR 

14 131 

27.95 

18.44 

20.73 

16.63 

15.61 

Flaptics 

511092 

68.18 

64.61 

63.64 

59.74 

57.47 

InlineSkate 

7 1882 

78.55 

76.55 

78.36 

74.73 

75.82 

ItalyPowerDemand 

2 24 

31.68 

20.99 

5.05 

6.31 

6.22 

MALLAT 

811024 

6.95 

6.10 

6.87 

4.22 

3.58 

Medicallmages 

10 99 

67.76 

58.42 

58.68 

58.03 

61.71 

MoteStrain 

2 84 

15.10 

13.18 

12.70 

13.58 

9.42 

SonyAIBORobot_SurfaceII 

2 65 

26.34 

21.09 

26.230 

23.29 

25.81 

SonyAIBORobot_Surface 

2 70 

38.10 

19.47 

39.77 

15.31 

7.65 

Symbols 

6 398 

7.64 

4.42 

3.92 

3.82 

3.62 

TwoLeadECG 

2 82 

24.14 

13.17 

27.04 

17.65 

22.39 

WordsSynonyms 

251270 

70.85 

64.26 

64.26 

63.32 

58.15 

Cricket_X 

12 300 

67.69 

52.82 

61.79 

57.17 

61.28 

Cricket_Y 

12 300 

68.97 

52.82 

46.92 

44.61 

54.87 

Cricket_Z 

12 300 

73.59 

48.97 

56.67 

51.79 

59.74 

uWaveGestureLibrary_X 

8 315 

38.97 

33.08 

34.34 

32.94 

33.42 

uWaveGestureLibrary_Y 

8 315 

49.30 

44.44 

42.18 

40.31 

40.14 

uWaveGestureLibrary_Z 

8 315 

47.40 

39.25 

41.96 

40.39 

39.84 

uWaveGestureLibrary_3D 

8 315 

10.11 

6.00 

13.74 

25.65 

8.43 

CharTrajTT 3D 

20 178 

6.58 

5.18 

4.20 

11.83 

4.34 

PWM2 

3 128 

43.00 

35.00 

21.00 

20.33 

11.67 

# Best Scores 

- 

0 

8 

6 

13 

22 

# Uniquely Best Scores 

- 

0 

6 

3 

10 

20 

Average rank 

- 

4,29 

2,8 

3,16 

2.22 

1,89 


Furthermore, pKDTW-PWA is evaluated as signifi¬ 
cantly better than DBA but not significantly better than 
iKDBA in this experiment. Note also that DBA is not 


shown to perform significantly better than KDTWMedoid- 

This post-hoc analysis is summarized in FigllOl which 
shows the ranking graph for the five algorithms tested 
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Parallel coordinates plot 


Boxplots (of the differences) 




A2 - Ai A3 - Ai Ad - Ai As - Ai A3 * A2 Ad - A2 As ■ A2 Ad - A3 As - A3 As - Ad 


Fig. 9. Post hoc analysis of the Friedman’s test: (Ai) DT\N Medoid , (A 2 ) DBA, (A 3 ) KDTW Medoid, (A 4 ) iKDBA and (A 5 ) 
pKDTW-PWA. 



Fig. 10. Dominance graph for the five tested algorithms, 
according to the significance relation corresponding to 
Table [3]with a P-value threshold set at .05. 

in our experiments. 

7 Conclusion 

In this paper, we address the reputedly difficult problem 
of averaging a set of time series in the context of a time 
elastic distance measure such as Dynamic Time Warping. 
The new perspective provided by the kernelization of 
the elastic distance firstly allows us to consider the av¬ 
eraging of time series as a preimage problem. This latter 
is unfortunately an ill-posed non-convex problem that 
could suffer from combinatorial number of local optima 
when dealing with long multidimensional time series. 
Furthermore, this kind of preimage problem can only 
be resolved using gradient-free optimization procedures 
that are computationally very costly (since extensive 
functional evaluation is required). 

However, this new kernelization approach allows a 
re-interpretation of pairwise kernel alignment matrices 
as distributions of probability over alignment paths. 
Based on this re-interpretation, we propose two distinct 
algorithms, iKDBA and pKDTW-PWA, based on itera¬ 
tive and progressive agglomerative heuristic methods. 


respectively, that are developed to compute approximate 
solutions to the multi-alignment of time series. 

We present an extensive experiment carried out on 
synthetic and real data sets, mostly containing univari¬ 
ate but also some multivariate time series. Our results 
show that centroid-based methods significantly outper¬ 
form medoid-based methods in the context of a first 
nearest neighbour classification task. Most strikingly, the 
pKDTW-PWA algorithm, which integrates joint averag¬ 
ing in the sample space and along the time axis, is sig¬ 
nificantly better than the state-of-the art DBA algorithm, 
with a potentially lower computational cost. Indeed, 
the simple one-pass progressive agglomerative heuristic 
procedure is used in the pKDTW-PWA algorithm can be 
further optimized. 
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